(57) Abstract

A lost frame recovery technique for LPC-based systems employs interpolation of parameters from previous and subsequent good
frames, selective attenuation of frame energy when the energy of a subframe exceeds a threshold, and energy tapering in the presence of
multiple successive lost frames.



WO 99/66494

PCT/US99/12804



IMPROVED LOST FRAME RECOVERY TECHNIQUES FOR 
PARAMETRIC, LPC-BASED SPEECH CODING SYSTEMS 

Background of the Invention 

The transmission of compressed speech over packet-switching and mobile 
communications networks involves two major systems. The source speech system 
encodes the speech signal on a frame by frame basis, packetizes the compressed 
speech into bytes of information, or packets, and sends these packets over the network. 
Upon reaching the destination speech system, the bytes of information are 
unpacketized into frames and decoded. The G.723.1 dual rate speech coder, described
in ITU-T Recommendation G.723.1, "Dual Rate Speech Coder for Multimedia
Communications Transmitting at 5.3 and 6.3 kbit/s," March 1996 (hereafter
"Reference 1", and incorporated herein by reference) was ratified by the ITU-T in
1996 and has since been used to add voice over various packet-switching as well as
mobile communications networks. With a mean opinion score of 3.98 out of 5.0 (see
Thryft, A. R., "Voice over IP Looms for Intranets in '98," Electronic Engineering
Times, August 1997, Issue 967, pp. 79, 102, hereafter "Reference 2", and
incorporated herein by reference), the near toll quality of the G.723.1 standard is ideal
for real-time multimedia applications over private and local area networks (LANs)
where packet loss is minimal. However, over wide area networks (WANs), global
area networks (GANs), and mobile communications networks, congestion can be
severe, and packet loss may result in heavily distorted speech if left untreated. It is
therefore necessary to develop techniques to reconstruct lost speech frames at the
receiver in order to minimize distortion and maintain output intelligibility.

The following discussion of the G.723.1 dual rate coder and its error
concealment will assist in a full understanding of the invention.

The G.723.1 dual rate speech coder encodes 16-bit linear pulse-code
modulated (PCM) speech, sampled at a rate of 8 kHz, using linear predictive
analysis-by-synthesis coding. The excitation for the high rate coder is Multipulse Maximum
Likelihood Quantization (MP-MLQ) while the excitation for the low rate coder is
Algebraic-Code-Excited Linear-Prediction (ACELP). The encoder operates on a 30
ms frame size, equivalent to a frame length of 240 samples, and divides every frame
into four subframes of 60 samples each. For every 30 ms speech frame, a 10th order
Linear Prediction Coding (LPC) filter is computed and its coefficients are quantized in
the form of Line Spectral Pair (LSP) parameters for transmission to the decoder. An
adaptive codebook pitch lag and pitch gain are then calculated for every subframe and
transmitted to the decoder. Finally, the excitation signal, consisting of the fixed
codebook gain, pulse positions, pulse signs, and grid index, is approximated using
either MP-MLQ for the high rate coder or ACELP for the low rate coder, and
transmitted to the decoder. In sum, the resulting bitstream sent from encoder to
decoder consists of the LSP parameters, adaptive codebook lags, fixed and adaptive
codebook gains, pulse positions, pulse signs, and the grid index.
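The frame arithmetic above can be checked with a short sketch. The function names `frame_layout` and `split_subframes` are illustrative only and do not come from Reference 1; the numeric defaults are the figures quoted in the text.

```python
def frame_layout(sample_rate_hz=8000, frame_ms=30, subframes=4):
    """Return (samples per frame, samples per subframe) for the given timing."""
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    return samples_per_frame, samples_per_frame // subframes


def split_subframes(frame, subframes=4):
    """Split one frame of PCM samples into equal-length subframes."""
    size = len(frame) // subframes
    return [frame[i * size:(i + 1) * size] for i in range(subframes)]


if __name__ == "__main__":
    per_frame, per_subframe = frame_layout()
    print(per_frame, per_subframe)
    print([len(sf) for sf in split_subframes([0] * per_frame)])
```

With the default arguments, an 8 kHz sampling rate and a 30 ms frame give the 240-sample frame and 60-sample subframes described above.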

At the decoder, the LSP parameters are decoded and the LPC synthesis filter
generates reconstructed speech. For every subframe, the fixed and adaptive codebook
contributions are sent to a pitch postfilter, whose output is input to the LPC synthesis
filter. The output of the synthesis filter is then sent to a formant postfilter and gain
scaling unit to generate the synthesized output. In the case of indicated frame
erasures, an error concealment strategy, described in the following subsection, is
provided. Figure 1 displays a block diagram of the G.723.1 decoder.

In the presence of packet losses, current G.723.1 error concealment involves
two major steps. The first step is LSP vector recovery and the second step is
excitation recovery. In the first step, the missing frame's LSP vector is recovered by
applying a fixed linear predictor to the previously decoded LSP vector. In the second
step, the missing frame's excitation is recovered using only the recent information
available at the decoder. This is achieved by first determining the previous frame's
voiced/unvoiced classifier using a cross-correlation maximization function, and then
testing the prediction gain for the best vector. If the gain is more than 0.58 dB, the
frame is declared as voiced; otherwise, the frame is declared as unvoiced. The
classifier then returns a value of 0 if the previous frame is unvoiced, or the estimated
pitch lag if the previous frame is voiced. In the unvoiced case, the missing frame's
excitation is then generated using a random number generator and scaled by
the average of the gains for subframes 2 and 3 of the previous frame. Otherwise, for
the voiced case, the previous frame is attenuated by 2.5 dB and regenerated with a
periodic excitation having a period equal to the estimated pitch lag. If packet losses
continue for the next two frames, the regenerated excitation is attenuated by an
additional 2.5 dB for each frame, but after three interpolated frames, the output is
completely muted, as described in Reference 1.
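The attenuation schedule just described for the voiced case can be sketched as follows. This is only an illustration of the schedule as stated in the text, not the reference implementation of Reference 1; the function name is hypothetical, and treating the dB figures as amplitude ratios (a divisor of 20) is an assumption.

```python
MAX_CONCEALED_FRAMES = 3   # after three reconstructed frames the output is muted
ATTENUATION_STEP_DB = 2.5  # additional attenuation per consecutive lost frame


def baseline_excitation_scale(lost_index: int) -> float:
    """Linear scale factor for the lost_index-th consecutive lost frame (1-based)."""
    if lost_index < 1:
        raise ValueError("lost_index is 1-based")
    if lost_index > MAX_CONCEALED_FRAMES:
        return 0.0  # complete muting of the output
    total_db = ATTENUATION_STEP_DB * lost_index
    # Assumption: the dB values are amplitude ratios, hence the divisor of 20.
    return 10.0 ** (-total_db / 20.0)


if __name__ == "__main__":
    for k in range(1, 6):
        print(k, round(baseline_excitation_scale(k), 3))
```

The abrupt drop to zero at the fourth consecutive lost frame is the source of the muting behaviour discussed in the remainder of this background section.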

The G.723.1 error concealment strategy was tested by sending various speech
segments over a network with packet loss levels of 1%, 3%, 6%, 10%, and 15%.
Single as well as multiple packet losses were simulated for each level. Through a
series of informal listening tests, it was shown that although the overall output quality
was very good for lower levels of packet loss, a number of problems persisted at all
levels and became increasingly severe as packet loss increased.

First, parts of the output segment sounded unnatural and contained many
annoying, metallic-sounding artifacts. The unnatural sounding quality of the output
can be attributed to LSP vector recovery based on a fixed predictor, as previously
described. Since the missing frame's LSP vector is recovered by applying a fixed
predictor to the previous frame's LSP vector, the spectral changes between the
previous and reconstructed frames are not smooth. As a result of the failure to
generate smooth spectral changes across missing frames, unnatural sounding output
quality occurs, which increases unintelligibility during high levels of packet loss. In
addition, many metallic-sounding artifacts were heard in the output.
These metallic-sounding artifacts primarily occur in unvoiced regions of the output,
and are caused by incorrect voicing estimation of the previous frame during excitation
recovery. In other words, since a missing, unvoiced frame may incorrectly be
classified as voiced, the transition into the missing frame will generate a
high-frequency glitch, or metallic-sounding artifact, by applying the estimated pitch lag
computed for the previous frame. As packet loss increases, this problem becomes
even more severe, as incorrect voicing estimation generates increased distortion.

Another problem using G.723.1 error concealment was the presence of
high-energy spikes in the output. These high-energy spikes, which are especially
uncomfortable for the listener, are caused by incorrect estimation of the LPC coefficients
during formant postfiltering, due to poor prediction of the LSP or gain parameters
using G.723.1 fixed LSP prediction and excitation recovery. Once again, as packet
loss increases, the number of high-energy spikes also increases, leading to greater
listener discomfort and distortion.

Finally, "choppy" speech, resulting from complete muting of the output, was
evident. Since G.723.1 error concealment reconstructs no more than three consecutive
missing frames, all remaining missing frames are simply muted, leading to patches of
silence in the output, or "choppy" speech. Since there is a greater probability that
more than three consecutive packets may be lost in a network when packet loss
increases, this will lead to increased "choppy" speech and hence, decreased
intelligibility and distortion at the output.

It is an object of the present invention to eliminate the above problems and
improve upon the error concealment strategy defined in Reference 1. This and other
objects are achieved by an improved lost frame recovery technique employing linear
interpolation, selective energy attenuation, and energy tapering.

Linear interpolation of the speech model parameters is a technique designed to
smooth spectral changes across frame erasures and hence eliminate any unnatural
sounding speech and metallic-sounding artifacts from the output. Linear interpolation
operates as follows: 1) At the decoder, a buffer is introduced to store a future speech
frame or packet. The previous and future information stored in the buffer are used to
interpolate the speech model parameters for the missing frame, thereby generating
smoother spectral changes across missing frames than if a fixed predictor were simply
used, as in G.723.1 error concealment. 2) Voicing classification is then based on both
the estimated pitch value and predictor gain for the previous frame, as opposed to
simply the predictor gain as in G.723.1 error concealment; this improves the
probability of correct voicing estimation for the missing frame. By applying the first
part of the linear interpolation technique, more natural-sounding speech is achieved;
by applying the second part of the linear interpolation technique, almost all unwanted
metallic-sounding artifacts are effectively masked away.

To eliminate the effects of high-energy spikes, a selective energy attenuation
technique was developed. This technique checks the signal energy for every
synthesized subframe against a threshold value, and attenuates all signal energies for
the entire frame to an acceptable level if the threshold is exceeded. Combined with
linear interpolation, this selective energy attenuation technique effectively eliminates
all instances of high-energy spikes from the output.

Finally, an energy tapering technique was designed to eliminate the effects of
"choppy" speech. Whenever multiple packets are lost in excess of one frame, this
technique simply repeats the previous good frame for every missing frame while
gradually decreasing the repeated frame's signal energy. By employing this
technique, the energy of the output signal is gradually smoothed or tapered over
multiple packet losses, thus eliminating any patches of silence or a "choppy" speech
effect evident in G.723.1 error concealment. Another advantage of energy tapering is
the relatively small amount of computation time required for reconstructing lost
packets. Compared to G.723.1 error concealment, since this technique only involves
gradual attenuation of the signal energies for repeated frames, as opposed to
performing G.723.1 fixed LSP prediction and excitation recovery, the total algorithmic
delay is considerably less.

Brief Description of the Drawings

The invention will be more clearly understood from the following description
in conjunction with the accompanying drawings, wherein:

Fig. 1 is a block diagram showing G.723.1 decoder operation;

Fig. 2 is a block diagram illustrating the use of Future, Ready and Copy buffers
in the interpolation technique according to the present invention;

Figs. 3a-3c are waveforms illustrating the elimination of high-energy spikes by
the error concealment technique of the present invention; and

Figs. 4a-4c are waveforms illustrating the elimination of output muting by the
error concealment technique according to the present invention.

Detailed Description of the Invention

The present invention comprises three techniques used to eliminate the
problems discussed above that arise from G.723.1 error concealment, namely,
unnatural sounding speech, metallic-sounding artifacts, high-energy spikes, and
"choppy" speech. It should be noted that the described error concealment techniques
are applicable to different types of parametric, Linear Predictive Coding (LPC) based
speech coders (e.g. APC, RELP, RPE-LPC, MPE-LPC, CELP, SELP, CELP-BB,
LD-CELP, and VSELP) as well as different packet-switching (e.g. Internet, Asynchronous
Transfer Mode, and Frame Relay) and mobile communications (e.g., mobile satellite
and digital cellular) networks. Thus, while the invention will be described in the
context of the G.723.1 MP-MLQ 6.3 kbps coder over the Internet, with the
description using terminology associated with this particular speech coder and
network, the invention is not to be so limited, but is readily applicable to other
parametric, LPC-based speech coders (e.g., the low rate ACELP coder, as well as other
similar coders) and to different networks.

Linear Interpolation 

Linear interpolation of the speech model parameters was developed to smooth
spectral changes across a single frame erasure (i.e. a missing frame in between two
good speech frames) and hence generate more natural sounding output while
eliminating any metallic-sounding artifacts from the output. The setup of the linear
interpolation system is illustrated in Figure 2. Linear interpolation requires three
buffers - the Future Buffer, Ready Buffer, and Copy Buffer, each of which is
equivalent to one 30 ms frame length. These buffers are inserted at the receiver before
decoding and synthesis take place. Before describing this technique, it is first
necessary to define the following terms as applied to linear interpolation:

previous frame, is the last good frame that was processed by the decoder, and
is stored in the Copy Buffer.

current frame, is a good or missing frame that is currently being processed by
the decoder, and is stored in the Ready Buffer.

future frame, is a good or missing frame immediately following the current
frame, and is stored in the Future Buffer.

Linear interpolation is a multi-step procedure that operates as follows:

1. The Ready Buffer stores the current good frame to be processed while
the Future Buffer stores the future frame of the encoded speech sequence. A
copy of the current frame's speech model parameters is made and stored in the
Copy Buffer.

2. The status of the future frame, either good or missing, is determined. If
the future frame is good, no linear interpolation is necessary, and the linear
interpolation flag is reset to 0. If the future frame is missing, linear
interpolation might be necessary, and the linear interpolation flag is
temporarily set to 1. (In a real-time system, a missing frame is detected by
either a receiver timeout or Cyclical Redundancy Check (CRC) failure. These
missing frame detection algorithms, however, are not part of the invention, but
must be recognized and incorporated at the decoder for proper operation of any
packet reconstruction strategy.)

3. The current frame is decoded and synthesized. A copy of the current
frame's LPC synthesis filter and pitch postfiltered excitation is made.

4. The future frame, originally in the Future Buffer, becomes the current
frame and is stored in the Ready Buffer. The next frame in the encoded speech
sequence arrives as the future frame in the Future Buffer.

5. The value of the linear interpolation flag is checked. If the flag is set to
0, the process jumps back to step (1). If the flag is set to 1, the process jumps
to step (6).

6. The status of the future frame is determined. If the future frame is
good, linear interpolation is applied; the linear interpolation flag remains set to
1; and the process jumps to step (7). If the future frame is missing, energy
tapering is applied; the energy tapering flag is set to 1; and the linear
interpolation flag is reset to 0. (Note: The energy tapering technique is applied
only for multiple frame losses and will be described later herein.)

7. LSP recovery is performed. Here, the LSP vectors from the
previous and future good frames, stored in the Copy and Future Buffers
respectively, are averaged to obtain the LSP vector for the current frame.

8. Excitation recovery is performed. Here, the fixed codebook gains from
the previous and future frames, stored in the Copy and Future Buffers, are
averaged to obtain the fixed codebook gain for the missing frame. All
remaining speech model parameters are taken from the previous frame.

9. Pitch lag and predictor gain estimation are performed for the previous
frame, stored in the Copy Buffer, with the identical procedure used in G.723.1 error
concealment.

10. If the predictor gain is less than 0.58 dB, the frame is declared
unvoiced, and the excitation signal for the current frame is generated using a
random number generator and scaled by the averaged fixed codebook gain
calculated in step (8).

11. If the predictor gain is greater than 0.58 dB and the estimated pitch lag
exceeds a threshold value Pthresh, the frame is declared voiced and the excitation
signal for the current frame is generated by first attenuating the
previous excitation by 1.25 dB for every two subframes, and then regenerating
this excitation with a period equal to the estimated pitch lag. Otherwise, the
current frame is declared unvoiced and the excitation signal is recovered as in step
(10).

12. After LSP and excitation recovery, the current frame, with its newly
interpolated LSP and gain parameters, is decoded and synthesized, and the
process jumps to step (13).

13. The future frame, originally in the Future Buffer, becomes the current
frame and is stored in the Ready Buffer. The next frame in the encoded speech
sequence arrives as the future frame in the Future Buffer. The process then
returns to step (1).
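Steps (7) and (8) above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the decoder's actual data layout: the `FrameParams` container and its field names are hypothetical, and averaging the fixed codebook gains element by element per subframe is an assumption, since the text does not say how the per-subframe gains are paired.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FrameParams:
    """Hypothetical container for one frame's decoded speech model parameters."""
    lsp: List[float]          # LSP vector for the frame
    fixed_gains: List[float]  # fixed codebook gain per subframe (assumed layout)
    other: Dict[str, object]  # all remaining parameters (lags, pulses, ...)


def average(a: List[float], b: List[float]) -> List[float]:
    """Element-wise midpoint of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def interpolate_missing(prev: FrameParams, future: FrameParams) -> FrameParams:
    """Recover a single missing frame from its two good neighbours."""
    return FrameParams(
        lsp=average(prev.lsp, future.lsp),                          # step (7)
        fixed_gains=average(prev.fixed_gains, future.fixed_gains),  # step (8)
        other=dict(prev.other),  # remaining parameters come from the previous frame
    )
```

Here `prev` plays the role of the Copy Buffer contents and `future` the Future Buffer contents; the voicing decision of steps (9) to (11) is a separate stage and is not covered by this sketch.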

There are at least two important advantages of linear interpolation over
G.723.1 error concealment. The first advantage occurs in step (7), during LSP
recovery. In step (7), since linear interpolation determines the missing frame's LSP
parameters based on the previous and future frames, this provides a better estimate for
the missing frame's LSP parameters, thereby enabling smoother spectral changes
across the missing frame than if fixed LSP prediction were simply used, as in G.723.1
error concealment. As a result, more natural sounding, intelligible speech is
generated, thereby increasing comfortability for the listener.

The second advantage of linear interpolation occurs in steps (8) to (11), during
excitation recovery. First, in step (8), since linear interpolation generates the missing
frame's gain parameters by averaging the fixed codebook gains between the previous
and future frames, it provides a better estimate for the missing frame's gain, as
opposed to the technique described in G.723.1 error concealment. This interpolated
gain, which is then applied for unvoiced frames in step (10), thereby generates
smoother, more comfortable sounding gain transitions across frame erasures.
Secondly, in step (11), voicing classification is based on both the predictor gain
and estimated pitch lag, as opposed to the predictor gain alone, as in G.723.1 error
concealment. That is, frames whose predictor gain is greater than 0.58 dB are also
compared against a threshold pitch lag, Pthresh. Since unvoiced frames are primarily
composed of high-frequency spectra, those frames that have low estimated pitch lags,
and hence high estimated pitch frequencies, thereby have a higher probability of
being unvoiced. Thus, frames whose estimated pitch lags fall below Pthresh are
declared unvoiced and those whose estimated pitch lags exceed Pthresh are declared
voiced. In sum, by selectively determining a frame's voicing classification based on
both the predictor gain and estimated pitch lag, the technique of this invention
effectively masks away all occurrences of high-frequency, metallic-sounding artifacts
in the output. As a result, listener comfortability is
increased.
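The two-condition voicing test can be written out as a short sketch and compared with the gain-only test attributed to G.723.1 error concealment. The function names are hypothetical, and the pitch lag threshold is left as a caller-supplied parameter because the text gives no numeric value for Pthresh.

```python
GAIN_THRESHOLD_DB = 0.58  # predictor gain threshold quoted in the text


def baseline_is_voiced(predictor_gain_db: float) -> bool:
    """Gain-only decision, as the text describes for G.723.1 error concealment."""
    return predictor_gain_db > GAIN_THRESHOLD_DB


def is_voiced(predictor_gain_db: float, pitch_lag: float, p_thresh: float) -> bool:
    """Decision of step (11): both the gain and the pitch lag must pass.

    p_thresh stands in for Pthresh; its value is an input, not a figure
    taken from the text.
    """
    return predictor_gain_db > GAIN_THRESHOLD_DB and pitch_lag > p_thresh


if __name__ == "__main__":
    # A frame with a high predictor gain but a short pitch lag is voiced under
    # the gain-only rule and unvoiced under the two-condition rule.
    print(baseline_is_voiced(1.0), is_voiced(1.0, pitch_lag=30, p_thresh=40))
```

The only frames whose classification changes are those with a predictor gain above 0.58 dB and a pitch lag at or below the threshold, which are the frames the text identifies as likely to be unvoiced.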

Selective Energy Attenuation

Selective energy attenuation was developed to eliminate instances of
high-energy spikes heard using G.723.1 error concealment. Referring to Figure 1, these
high-energy spikes are caused by incorrect estimation of the LPC coefficients during
formant postfiltering, due to poor prediction of the LSP or gain parameters by
G.723.1 error concealment. To provide better estimates for a missing frame's LSP
and gain parameters, linear interpolation was developed as previously described. In
addition, the signal energy for every synthesized subframe, after formant postfiltering,
is checked against a threshold energy, Sthresh. If the signal energy for any one of the four
subframes exceeds Sthresh, then the signal energies for all remaining subframes are
attenuated to an acceptable energy level, Smax. Combined with linear interpolation,
this selective energy attenuation technique effectively eliminates all instances of
high-energy spikes, without adding noticeable degradation to the output. Overall, speech
intelligibility and listener comfortability are improved. Figure 3b shows the
presence of a high-energy spike due to G.723.1 error concealment; Figure 3c shows
elimination of the high-energy spike due to selective energy attenuation and linear
interpolation.
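A minimal sketch of the threshold check follows. It rests on several assumptions that the text does not settle: subframe energy is taken as the sum of squared samples, the two energy levels are passed in as parameters `s_thresh` and `s_max` because no numeric values are given, and "attenuated to an acceptable energy level" is read as scaling down any subframe whose energy is above `s_max` so that it lands on `s_max`.

```python
import math
from typing import List


def subframe_energy(samples: List[float]) -> float:
    """Energy of one subframe, assumed here to be the sum of squared samples."""
    return sum(x * x for x in samples)


def selective_attenuation(frame: List[float], s_thresh: float, s_max: float,
                          subframe_len: int = 60) -> List[float]:
    """Leave the frame untouched unless some subframe energy exceeds s_thresh."""
    subframes = [frame[i:i + subframe_len] for i in range(0, len(frame), subframe_len)]
    energies = [subframe_energy(sf) for sf in subframes]
    if not energies or max(energies) <= s_thresh:
        return list(frame)  # no spike detected
    out: List[float] = []
    for sf, energy in zip(subframes, energies):
        if energy > s_max:
            scale = math.sqrt(s_max / energy)  # brings the energy down to s_max
            out.extend(x * scale for x in sf)
        else:
            out.extend(sf)
    return out
```

Because the check runs on synthesized output after formant postfiltering, it acts as a safety net behind linear interpolation rather than as a replacement for it.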

Energy Tapering

Energy tapering was developed to eliminate the effects of "choppy" speech
generated by G.723.1 error concealment. As recalled, "choppy" speech results when
G.723.1 error concealment completely mutes the output after three missing frames are
reconstructed. As a result, patches of silence are generated at the output, thereby
decreasing intelligibility. To eliminate this problem, a multi-step
energy tapering technique was designed. Referring to Figure 2, this
technique operates as follows:



1. The Ready Buffer stores the current good frame to be processed while
the Future Buffer stores the future frame of the encoded speech sequence. A
copy of the current frame's speech model parameters is made and stored in the
Copy Buffer.

2. The status of the future frame, either good or missing, is determined. If
the future frame is good, no linear interpolation is necessary; the linear
interpolation flag is reset to 0. If the future frame is missing, linear interpolation
might be necessary; the linear interpolation flag is temporarily set to 1.

3. The current frame is decoded and synthesized. A copy of the current
frame's LPC synthesis filter and pitch postfiltered excitation is made.

4. The future frame, originally in the Future Buffer, becomes the current
frame and is stored in the Ready Buffer. The next frame in the encoded speech
sequence arrives as the future frame in the Future Buffer.

5. The value of the linear interpolation flag is checked. If the flag is set to
0, the process jumps back to step (1). If the flag is set to 1, the process jumps
to step (6).

6. The status of the future frame is determined. If the future frame is
good, linear interpolation is applied. If the
future frame is missing, energy tapering is applied; the energy tapering flag is
set to 1, the linear interpolation flag is reset to 0, and the process jumps to step
(7).

7. The copy of the previous frame's pitch postfiltered excitation, from
step (3), is attenuated by [illegible in source].

8. The copy of the previous frame's LPC synthesis filter, from step (3), is
used to synthesize the current frame using the attenuated excitation in step (7).

9. The future frame, originally in the Future Buffer, becomes the current
frame and is stored in the Ready Buffer. The next frame in the encoded speech
sequence arrives as the future frame in the Future Buffer.

10. The current frame is synthesized using steps (7) to (9), then jumps to
step (11).

11. The status of the future frame is determined. If the future frame is
good, no further energy tapering is applied; the energy tapering flag is reset to
0, and the process jumps to step (12). If the future frame is missing, further
energy tapering is applied; the energy tapering flag is incremented by 1, and
the process jumps to step (10).

12. The future frame, originally in the Future Buffer, becomes the current
frame and is stored in the Ready Buffer. The next frame in the encoded speech
sequence arrives as the future frame in the Future Buffer. The process jumps
back to step (1).
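The tapering loop of steps (7) to (11) can be sketched as below. This is a hedged illustration, not the patented procedure verbatim: the per-frame step size `step_db` is a placeholder parameter rather than a value taken from the text, the attenuation is assumed to grow in proportion to the energy tapering flag, and the dB values are assumed to be amplitude ratios.

```python
from typing import List


def taper_gain(taper_flag: int, step_db: float) -> float:
    """Linear gain for a repeated frame.

    Assumption: total attenuation is step_db times the energy tapering flag,
    expressed as an amplitude ratio.
    """
    return 10.0 ** (-(step_db * taper_flag) / 20.0)


def conceal_burst(last_good_excitation: List[float], n_lost: int,
                  step_db: float) -> List[List[float]]:
    """Repeat the last good excitation for n_lost frames with growing attenuation."""
    frames = []
    for flag in range(1, n_lost + 1):  # the flag is incremented once per lost frame
        gain = taper_gain(flag, step_db)
        frames.append([x * gain for x in last_good_excitation])
    return frames
```

Each concealed frame would then be passed through the stored copy of the previous frame's LPC synthesis filter, as in step (8). Unlike the three-frame limit of the baseline scheme, the gain here shrinks with every lost frame but is never forced to zero.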



By employing this technique, the energy of the output signal is gradually
tapered over multiple packet losses, and hence eliminates the effects of "choppy"
speech caused by complete output muting. Figure 4b shows the presence of complete output
muting due to G.723.1 error concealment; Figure 4c shows elimination of output
muting due to energy tapering. As Figure 4c illustrates, the output is gradually tapered
over multiple packet losses, thereby eliminating any patches of silence in the
output and generating greater intelligibility for the listener.

As discussed above, one of the clear advantages of energy tapering over
G.723.1 error concealment, besides improved output intelligibility, is the relatively
lower amount of computation time required. Since energy tapering only repeats the
previous frame's LPC synthesis filter and attenuates the previous frame's pitch
postfiltered gain, the total algorithmic delay is considerably less compared to
performing full-scale LSP and excitation recovery, as in G.723.1 error concealment.
This approach minimizes the overall delay in order to provide the user with a more
robust, real-time communications system.

Improved Results of the Invention

The three error concealment techniques were tested for various speakers under
the identical levels of packet loss carried out using G.723.1 error concealment. A
series of informal listening tests indicated that for all levels of packet loss, the quality
of the output speech segment was significantly improved in the following ways: First,
more natural sounding speech and effective masking away of all metallic-sounding
artifacts were achieved due to smoother spectral transitions across missing frames
based on linear interpolation and improved voicing classification. Secondly, all
high-energy spikes were eliminated due to selective energy attenuation and linear
interpolation. Finally, all instances of "choppy" speech were eliminated due to energy
tapering. It is important to realize that as network congestion levels increase, the
amount of packet loss also increases. Thus, in order to maintain real-time speech
intelligibility, it is essential to develop techniques to successfully conceal frame
erasures while minimizing the amount of degradation at the output. The strategies
developed by the authors represent techniques which provide improved output speech
quality, are more robust in the presence of frame erasures compared to the techniques
described in Reference 1, and can be easily applied with any parametric, LPC-based
speech coder over any packet-switching or mobile communications network.

It will be appreciated that various changes and modifications may be made to
the specific embodiments described above without departing from the spirit and scope
of the invention as defined in the following claims.
What is Claimed Is:

1. A method of recovering a lost frame in a system of the type wherein
information is transmitted as successive frames of encoded signals and the information
is reconstructed from said encoded signals at a receiver, said method comprising:

storing encoded signals from a first frame prior to said lost frame;

storing encoded signals from a second frame subsequent to said lost
frame; and

interpolating between the encoded signals from said first and second
frames to obtain recovered encoded signals for said lost frame.

2. A method according to claim 1, wherein said encoded signals include a
plurality of Line Spectral Pair (LSP) parameters corresponding to each frame, and said
interpolating step comprises interpolating between the LSP parameters of said first
frame and the LSP parameters of said second frame.

3. A method according to claim 2, wherein in reconstructing said
information said receiver decodes a frame as voiced or unvoiced, and wherein
said receiver further calculates an estimated pitch value and a predictor gain for said first
frame, said method comprising the step of classifying said lost frame as voiced or
unvoiced in accordance with said estimated pitch value and predictor gain for said first
frame.

4. A method according to claim 1, wherein each frame includes a plurality
of subframes, said method comprising the step of comparing a signal energy for each
subframe of a particular frame against a threshold, and attenuating signal energies for
all subframes in said particular frame if the signal energy in any subframe exceeds
said threshold.
5. A method according to claim 1, wherein on loss of multiple successive
frames, said method comprises the step of repeating the encoded signals for a frame
immediately preceding said multiple successive frames while gradually reducing the
signal energy for each recovered frame.

6. A method according to claim 2, wherein said encoded signals include
said LSP parameters, fixed codebook gains and further excitation signals, said method
comprising interpolating said fixed codebook gain of said lost frame from the fixed
codebook gains of said first and second frames, and adopting said further excitation
signals from said first frame as the further excitation signals of said lost frame.

7. A method of recovering a lost frame in a system of the type wherein
information is transmitted as successive frames of encoded signals and the information
is reconstructed from said encoded signals at a receiver, said method comprising:

calculating an estimated pitch value and predictor gain for a first frame
prior to said lost frame; and

classifying said lost frame as voiced or unvoiced, in accordance with
said predictor gain and estimated pitch value from said first frame.

8. A method of recovering a lost frame in a system of the type wherein
information is transmitted as successive frames of encoded signals, each frame
including plural subframes, and the information is reconstructed from said encoded
signals at a receiver, said method comprising:

comparing a signal energy for each subframe of a particular frame
against a threshold; and

attenuating signal energies for all subframes in said particular frame if
the signal energy in any subframe exceeds said threshold.